System calls for using TCP 


Client                                   Server

                                         socket: make socket
                                         bind: assign address
                                         listen: listen for clients
socket: make socket
bind: assign address (optional)
connect: connect to listening socket
                                         accept: accept connection
write: send data                         read: receive data
read: receive data                       write: send data


• Anything in red might block, waiting for the network
- Obviously bad for applications that need concurrency
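
The call sequence above can be sketched in one process over the loopback interface. This is a minimal illustration, not a production server: the helper names (tcp_listen_loopback, tcp_connect_loopback, tcp_demo) are hypothetical, the port is chosen by the kernel, and every call here is blocking.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: socket, bind, listen. Binds 127.0.0.1 on a kernel-chosen
 * port (port 0), stores that port in *port, returns the listening fd. */
int tcp_listen_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Client side: socket, then connect (the optional bind is skipped). */
int tcp_connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Runs the whole exchange in one process. Returns 0 if the server side
 * reads exactly the bytes the client wrote, -1 otherwise. */
int tcp_demo(void)
{
    unsigned short port;
    char buf[16];
    int ok, lfd, cfd, sfd;

    if ((lfd = tcp_listen_loopback(&port)) < 0)
        return -1;
    cfd = tcp_connect_loopback(port);   /* handshake completes into the backlog */
    sfd = cfd < 0 ? -1 : accept(lfd, NULL, NULL);
    ok = sfd >= 0 && write(cfd, "ping", 4) == 4 &&
         read(sfd, buf, sizeof(buf)) == 4 && memcmp(buf, "ping", 4) == 0;
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return ok ? 0 : -1;
}
```

The single-process layout only works because the kernel completes the TCP handshake for a listening socket before accept is called; a real client and server would run in separate processes.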

Non-blocking I/O

• Use fcntl to set the O_NONBLOCK flag on a descriptor
• Non-blocking semantics of system calls:
- read immediately returns -1 with errno EAGAIN if no data is available
- write may not write all data, or may return EAGAIN
- connect may “fail” with EINPROGRESS (or may succeed, or may fail with a real error like ECONNREFUSED)
- accept may fail with EAGAIN if there are no pending connections
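
A small sketch of the read case, assuming a pipe stands in for a network socket (the semantics of O_NONBLOCK are the same). The helper names make_nonblocking and read_empty_pipe_errno are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Sets O_NONBLOCK on fd while preserving its other status flags. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Reads from an empty non-blocking pipe and returns the errno that read
 * reported, 0 if read unexpectedly did not fail, or -1 on setup failure. */
int read_empty_pipe_errno(void)
{
    int p[2], err;
    char c;
    ssize_t n;

    if (pipe(p) < 0)
        return -1;
    if (make_nonblocking(p[0]) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    n = read(p[0], &c, 1);          /* returns at once instead of blocking */
    err = (n < 0) ? errno : 0;
    close(p[0]);
    close(p[1]);
    return err;
}
```

On an empty pipe whose write end is still open, the value returned should be EAGAIN.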

How to know when to read/write?

struct pollfd {
    int   fd;        /* file descriptor */
    short events;    /* Events you are interested in */
    short revents;   /* Events that have happened (results) */
};

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

/* Some possible events: */
#define POLLIN   0x0001   /* Can read fd without blocking */
#define POLLOUT  0x0004   /* Can write fd without blocking */
#define POLLERR  0x0008   /* Error on fd (only in revents) */
#define POLLHUP  0x0010   /* "Hangup" has occurred on fd */

• Note: BSD used select to achieve the same thing
- Most OSes support both select and poll today
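
As a rough illustration of poll, the sketch below asks whether a descriptor is readable right now (timeout 0) before and after a byte is written to a pipe. The names readable_now and poll_demo are hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error.
 * A timeout of 0 makes poll return immediately. */
int readable_now(int fd)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, 0);
    if (n < 0)
        return -1;
    return n == 1 && (pfd.revents & POLLIN) != 0;
}

/* Returns 0 if an empty pipe is reported as not readable and becomes
 * readable after one byte is written, -1 otherwise. */
int poll_demo(void)
{
    int p[2], before, after = -1;

    if (pipe(p) < 0)
        return -1;
    before = readable_now(p[0]);
    if (write(p[1], "x", 1) == 1)
        after = readable_now(p[0]);
    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```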

epoll

• Newer Linux provides epoll
• Interface allows more efficient implementation
- Register interest with epoll_ctl syscall
- Wait with epoll_wait syscall
- Kernel doesn’t have to re-scan pollfd array on each wait
• New option bits reduce calls to epoll_ctl
- EPOLLONESHOT: only wait for event once
- EPOLLET: “edge triggered” (as opposed to level triggered)
• epoll is Linux specific
- But BSD has kqueue/kevent, which is a similar idea

epoll interface

typedef union epoll_data {
    int fd;
    /* ... */
} epoll_data_t;

struct epoll_event {
    __uint32_t   events;   /* Epoll events */
    epoll_data_t data;     /* User data variable */
};

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
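
A minimal, Linux-only sketch of the three calls working together: register a pipe's read end once, then wait. The function name epoll_demo is hypothetical, and the one-second timeout is an arbitrary choice for the example.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe with epoll, writes one byte, and
 * returns the fd that epoll_wait reports as readable (-1 on failure).
 * The fd that was registered is stored in *registered_fd. */
int epoll_demo(int *registered_fd)
{
    struct epoll_event ev, out;
    int p[2], epfd, got = -1;

    if (pipe(p) < 0)
        return -1;
    *registered_fd = p[0];
    epfd = epoll_create(1);            /* size is only a hint, must be > 0 */
    if (epfd >= 0) {
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLIN;           /* level-triggered readability */
        ev.data.fd = p[0];             /* user data handed back by epoll_wait */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 1000) == 1 &&
            (out.events & EPOLLIN))
            got = out.data.fd;
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return got;
}
```

Unlike poll, the interest set lives in the kernel, so repeated epoll_wait calls do not pass the descriptor list again.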




Asynchronous programming model 


• Many non-blocking file descriptors in one process
- Wait for pending I/O events on many file descriptors
- Each event triggers some callback function
• E.g., build “callback harness”:

/* Register callback for when fd is readable or writable */
void cb_add (int fd, int write, void (*fn) (void *), void *arg);

/* Unregister callback */
void cb_free (int fd, int write);

/* Loop forever checking callbacks */
void cb_check (void);
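
One way such a harness could be built on top of poll is sketched below. It is an illustration under stated assumptions, not the library the slides refer to: the fixed-size table (CB_MAX), the helpers cb_find and cb_check_once, and the demo function cb_demo are all hypothetical additions.

```c
#include <poll.h>
#include <unistd.h>

#define CB_MAX 64   /* assumed fixed table size for this sketch */

struct cb {
    int fd, write, used;
    void (*fn)(void *);
    void *arg;
};
static struct cb cb_table[CB_MAX];

/* Index of the registration for (fd, write), or -1 if there is none. */
static int cb_find(int fd, int write)
{
    int i;
    for (i = 0; i < CB_MAX; i++)
        if (cb_table[i].used && cb_table[i].fd == fd &&
            cb_table[i].write == write)
            return i;
    return -1;
}

/* Unregister callback */
void cb_free(int fd, int write)
{
    int i = cb_find(fd, write);
    if (i >= 0)
        cb_table[i].used = 0;
}

/* Register callback for when fd is readable (write == 0) or writable.
 * A full table silently drops the request in this sketch. */
void cb_add(int fd, int write, void (*fn)(void *), void *arg)
{
    int i;
    cb_free(fd, write);   /* at most one callback per (fd, direction) */
    for (i = 0; i < CB_MAX; i++)
        if (!cb_table[i].used) {
            struct cb c = { fd, write, 1, fn, arg };
            cb_table[i] = c;
            return;
        }
}

/* One pass: poll every registered fd and run the callbacks that are ready.
 * Returns the number of callbacks run, or -1 if poll fails. */
int cb_check_once(int timeout_ms)
{
    struct pollfd pfds[CB_MAX];
    int dir[CB_MAX];      /* direction of each polled entry */
    int i, k, n = 0, ran = 0;

    for (i = 0; i < CB_MAX; i++)
        if (cb_table[i].used) {
            pfds[n].fd = cb_table[i].fd;
            pfds[n].events = cb_table[i].write ? POLLOUT : POLLIN;
            pfds[n].revents = 0;
            dir[n++] = cb_table[i].write;
        }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0)
        return -1;
    for (i = 0; i < n; i++)
        /* Look the entry up again: an earlier callback may have freed it. */
        if (pfds[i].revents && (k = cb_find(pfds[i].fd, dir[i])) >= 0) {
            struct cb c = cb_table[k];
            c.fn(c.arg);
            ran++;
        }
    return ran;
}

/* Loop forever checking callbacks */
void cb_check(void)
{
    for (;;)
        cb_check_once(-1);
}

static void count_cb(void *arg) { ++*(int *)arg; }

/* Returns 0 if the callback fires only once the pipe becomes readable. */
int cb_demo(void)
{
    int p[2], hits = 0, r1, r2;
    if (pipe(p) < 0)
        return -1;
    cb_add(p[0], 0, count_cb, &hits);
    r1 = cb_check_once(0);             /* pipe empty: no callback runs */
    r2 = write(p[1], "x", 1) == 1 ? cb_check_once(0) : -1;
    cb_free(p[0], 0);
    close(p[0]);
    close(p[1]);
    return (r1 == 0 && r2 == 1 && hits == 1) ? 0 : -1;
}
```

Because poll is level-triggered, a callback that does not consume its data (or unregister itself) will simply run again on the next pass.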

Simplified example

struct state {
  int fd;
  /* ... */
};

static void doit_2 (void *_st);
static void doit_3 (void *_st);

void doit (void) {
  struct state *st = malloc (sizeof (*st));
  st->fd = create_new_tcp_socket ();
  /* Non-blocking connect: may return EINPROGRESS */
  connect (st->fd, &someplace, sizeof (someplace));
  /* Socket becomes writable once the connection completes */
  cb_add (st->fd, 1, doit_2, st);
}
static void doit_2 (void *_st) {
  struct state *st = _st;
  write (st->fd, "request\n", 8);
  cb_free (st->fd, 1);
  cb_add (st->fd, 0, doit_3, st);
}
static void doit_3 (void *_st) {
  struct state *st = _st;
  /* read more from st->fd until you get full response */
}

Syntactic sugar

• Problem: Need state from one callback to next
• E.g., C++ can implement wrap that bundles a function with its arguments

callback<void, int>::ref errwrite = wrap (write, 2);
(*errwrite) ("hello", 5);   // calls write (2, "hello", 5);

• Possible to build large event-driven apps this way
- E.g., I have built a large library to do this
- Debugging features include recording where callbacks were created, to facilitate tracing
• Google reportedly does similar things

Intro to Threads

• Threads: most popular abstraction for concurrency
- Lighter-weight abstraction than processes
- All threads in one process have same memory, file descriptors, etc.
- Allows one process to use multiple CPUs
• Example: threaded web server:
- Service many clients simultaneously

for (;;) {
  fd = accept_client ();
  thread_create (service_client, &fd);
}

How to share CPU amongst threads

• Each thread has execution state:
- Stack, program counter, registers, condition codes, etc.
• Switch the CPU amongst the threads
- Save away execution state of one, load up that of next
• When to switch?
- Current thread can no longer use the CPU (waiting for I/O)
- Current thread has had CPU for too long (preemption)
- Scheduler maintains lists of runnable/running/waiting threads

Thread package API

• tid create (void (*fn) (void *), void *arg);
- Create a new thread, run fn with arg
• void exit ();
- Destroy current thread
• void join (tid thread);
- Wait for thread thread to exit
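
The generic API above maps closely onto POSIX threads, where create, exit and join correspond to pthread_create, pthread_exit (or returning from the thread function) and pthread_join. A small sketch, with the hypothetical helpers square and square_all and an arbitrary limit of 16 threads:

```c
#include <pthread.h>
#include <stddef.h>

/* Thread function: squares the int that arg points to. Returning from
 * the function ends the thread, like calling pthread_exit. */
static void *square(void *arg)
{
    int *n = arg;
    *n = *n * *n;
    return NULL;
}

/* Creates one thread per element, then joins them all.
 * Returns 0 on success, -1 if a thread could not be created. */
int square_all(int *vals, int count)
{
    pthread_t tids[16];
    int i, started = 0, rc = 0;

    if (count > 16)
        return -1;
    for (i = 0; i < count; i++) {
        if (pthread_create(&tids[i], NULL, square, &vals[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);   /* wait for thread i to exit */
    return rc;
}
```

Each thread gets a pointer to its own array element, so no two threads touch the same memory and no mutex is needed here.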

Synchronization primitives

• void lock (mutex_t m);
• void unlock (mutex_t m);
- Only one thread acquires m at a time, others wait
- All global data must be protected by a mutex!
• void wait (mutex_t m, cond_t c);
- Atomically unlock m and sleep until c is signaled
• void signal (cond_t c);
• void broadcast (cond_t c);
- Wake one/all users waiting on c

Example: Taking job from work queue

job *job_queue;
mutex_t job_mutex;
cond_t job_cond;

void workthread (void *arg) {
  job *j;
  for (;;) {
    lock (job_mutex);
    while (!(j = job_queue))         /* re-check after every wakeup */
      wait (job_mutex, job_cond);
    job_queue = j->next;
    unlock (job_mutex);
    do_job (j);                      /* run the job without holding the lock */
  }
}

Example: Adding job to work queue

void addjob (job *j) {
  lock (job_mutex);
  j->next = job_queue;
  job_queue = j;
  signal (job_cond);
  unlock (job_mutex);
}

• Atomic release/wait necessary in workthread, otherwise:
- workthread checks queue, releases lock
- addjob adds job to queue, signals job_cond
- workthread waits for signal that was already delivered
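
The two work-queue slides can be combined into a runnable sketch with POSIX threads, where pthread_cond_wait provides the atomic release-and-wait. The value field, the job_sum accumulator, the fixed job count passed to the worker, and run_jobs are assumptions made so the example terminates and can be checked.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct job {
    struct job *next;
    int value;                 /* hypothetical payload */
} job;

static job *job_queue;
static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t job_cond = PTHREAD_COND_INITIALIZER;
static int job_sum;            /* written only by the worker while it runs */

void addjob(job *j)
{
    pthread_mutex_lock(&job_mutex);
    j->next = job_queue;
    job_queue = j;
    pthread_cond_signal(&job_cond);
    pthread_mutex_unlock(&job_mutex);
}

/* Worker: takes exactly *arg jobs from the queue, then exits. */
static void *workthread(void *arg)
{
    int remaining = *(int *)arg;

    while (remaining-- > 0) {
        job *j;
        pthread_mutex_lock(&job_mutex);
        while (!(j = job_queue))
            /* Atomically releases job_mutex and sleeps; the mutex is
             * held again when the call returns. */
            pthread_cond_wait(&job_cond, &job_mutex);
        job_queue = j->next;
        pthread_mutex_unlock(&job_mutex);
        job_sum += j->value;   /* "do" the job outside the lock */
    }
    return NULL;
}

/* Starts one worker, queues three jobs, and returns the sum of their
 * values as computed by the worker (-1 if the thread cannot start). */
int run_jobs(void)
{
    job jobs[3] = { { NULL, 1 }, { NULL, 2 }, { NULL, 3 } };
    pthread_t tid;
    int i, n = 3;

    job_sum = 0;
    if (pthread_create(&tid, NULL, workthread, &n) != 0)
        return -1;
    for (i = 0; i < 3; i++)
        addjob(&jobs[i]);
    pthread_join(tid, NULL);
    return job_sum;
}
```

The worker may start before or after the jobs are queued; the while loop around pthread_cond_wait makes both orders correct.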

Other thread package features

• Alerts: cause exception in a thread
• Trylock: don’t block if can’t acquire mutex
• Timedwait: timeout on condition variable
• Shared locks: concurrent read accesses to data
• Thread priorities: control scheduling policy
• Thread-specific global data

Implementing shared locks

struct sharedlk {
  /* i: -1 = held exclusively, 0 = free, > 0 = number of sharers */
  int i; mutex_t m; cond_t c;
};

void AcquireExclusive (sharedlk *sl) {
  lock (sl->m);
  while (sl->i) { wait (sl->m, sl->c); }
  sl->i = -1;
  unlock (sl->m);
}

void AcquireShared (sharedlk *sl) {
  lock (sl->m);
  while (sl->i < 0) { wait (sl->m, sl->c); }
  sl->i++;
  unlock (sl->m);
}

Shared locks (continued)

void ReleaseShared (sharedlk *sl) {
  lock (sl->m);
  if (!--sl->i) signal (sl->c);
  unlock (sl->m);
}
void ReleaseExclusive (sharedlk *sl) {
  lock (sl->m);
  sl->i = 0;
  broadcast (sl->c);
  unlock (sl->m);
}

• Must deal with starvation
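
One possible way to keep a steady stream of readers from starving writers is to make new readers wait whenever a writer is queued. The sketch below uses POSIX threads and is only one policy among several: the waiting_writers field and sharedlk_init are assumptions not present in the slides, and this policy in turn lets writers starve readers.

```c
#include <pthread.h>
#include <stddef.h>

struct sharedlk {
    int i;                  /* -1: exclusive holder, 0: free, >0: sharers */
    int waiting_writers;    /* assumed extra field: writers blocked in Acquire */
    pthread_mutex_t m;
    pthread_cond_t c;
};

void sharedlk_init(struct sharedlk *sl)
{
    sl->i = 0;
    sl->waiting_writers = 0;
    pthread_mutex_init(&sl->m, NULL);
    pthread_cond_init(&sl->c, NULL);
}

void AcquireExclusive(struct sharedlk *sl)
{
    pthread_mutex_lock(&sl->m);
    sl->waiting_writers++;              /* blocks new readers from entering */
    while (sl->i)
        pthread_cond_wait(&sl->c, &sl->m);
    sl->waiting_writers--;
    sl->i = -1;
    pthread_mutex_unlock(&sl->m);
}

void AcquireShared(struct sharedlk *sl)
{
    pthread_mutex_lock(&sl->m);
    /* Also wait while any writer is queued, so writers are not starved. */
    while (sl->i < 0 || sl->waiting_writers > 0)
        pthread_cond_wait(&sl->c, &sl->m);
    sl->i++;
    pthread_mutex_unlock(&sl->m);
}

void ReleaseShared(struct sharedlk *sl)
{
    pthread_mutex_lock(&sl->m);
    /* Broadcast rather than signal: readers may now be waiting on c too,
     * and a single wakeup could go to a reader that must sleep again. */
    if (!--sl->i)
        pthread_cond_broadcast(&sl->c);
    pthread_mutex_unlock(&sl->m);
}

void ReleaseExclusive(struct sharedlk *sl)
{
    pthread_mutex_lock(&sl->m);
    sl->i = 0;
    pthread_cond_broadcast(&sl->c);
    pthread_mutex_unlock(&sl->m);
}
```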

Deadlock

• Mutex ordering:
- A locks m1, B locks m2, A locks m2, B locks m1
- How to avoid?
• Similar deadlock with condition variables
- Suppose resource 1 managed by c1, resource 2 by c2
- A has 1, waits on c2, B has 2, waits on c1
• Mutex/condition variable deadlock:
- lock (a); lock (b); while (!ready) wait (b, c);
  unlock (b); unlock (a);
- lock (a); lock (b); ready = true; signal (c);
  unlock (b); unlock (a);
• Moral: Bad to hold locks when crossing abstraction barriers!
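
A common answer to the mutex-ordering question is to always acquire locks in one global order. The sketch below orders two POSIX mutexes by address; lock_pair, unlock_pair and the demo are hypothetical helpers, and address ordering is just one convenient choice of global order.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Acquires both mutexes in a fixed global order (lowest address first),
 * no matter which order the caller names them in. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);
    if (b != a)
        pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (b != a)
        pthread_mutex_unlock(b);
}

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
static long shared_count;      /* protected by holding both m1 and m2 */

/* Each worker names the mutexes in a different order; with plain
 * lock(m1); lock(m2) versus lock(m2); lock(m1) this could deadlock. */
static void *worker(void *arg)
{
    int flip = *(int *)arg, i;

    for (i = 0; i < 10000; i++) {
        if (flip)
            lock_pair(&m2, &m1);
        else
            lock_pair(&m1, &m2);
        shared_count++;
        unlock_pair(&m1, &m2);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final count. */
long lock_order_demo(void)
{
    pthread_t a, b;
    int zero = 0, one = 1;

    shared_count = 0;
    if (pthread_create(&a, NULL, worker, &zero) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &one) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_count;
}
```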

Data races

• Example: modify global ++x without mutex
- Might compile to: load, add 1, store
- Bad interleaving changes result: load, load, ...
• Even single instructions can have races
- E.g., addl $1,_x
- Not atomic on MP without lock prefix!
• Even reads dangerous on some architectures
• But sometimes cheating buys efficiency

if (!initialized) {
  lock (m);
  if (!initialized) { initialize (); initialized = 1; }
  unlock (m);
}
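
In C11 and POSIX there are race-free alternatives to both examples: atomic_fetch_add gives an atomic read-modify-write for ++x, and pthread_once gives one-time initialization without hand-written double checking. The sketch below is illustrative; bump, race_free_demo and init_call_count are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_long x;                        /* shared counter */
static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static atomic_int init_calls;                /* how often initialize ran */

static void initialize(void)
{
    atomic_fetch_add(&init_calls, 1);
}

static void *bump(void *arg)
{
    int i;

    (void)arg;
    pthread_once(&init_once, initialize);    /* runs initialize once overall */
    for (i = 0; i < 100000; i++)
        atomic_fetch_add(&x, 1);             /* atomic load, add 1, store */
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value. */
long race_free_demo(void)
{
    pthread_t a, b;

    atomic_store(&x, 0);
    if (pthread_create(&a, NULL, bump, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, bump, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&x);
}

int init_call_count(void)
{
    return atomic_load(&init_calls);
}
```

With a plain long and ++x in place of the atomic, the final count could come out lower than 200000 because of lost updates.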

Implementing user-level threads

• Allocate a new stack for each thread created
• Keep a queue of runnable threads
• Replace networking system calls (read/write/etc.)
- If operation would block, switch and run different thread
• Schedule periodic timer signal (setitimer)
- Switch to another thread on timer signals (preemption)
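
The first two steps (a stack per thread, a set of runnable threads) can be sketched portably with the ucontext calls instead of hand-written assembly. This is a cooperative illustration only: threads yield explicitly, the timer-driven preemption and system call wrappers from the list are left out, and all names (thread_body, run_user_threads, NTHREADS, STACK_SIZE) are hypothetical.

```c
#include <stdlib.h>
#include <ucontext.h>

#define NTHREADS   2
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx, thr_ctx[NTHREADS];
static int done[NTHREADS];
static int *order_out, order_len, order_max;

/* Body of each user-level thread: record its id twice, yielding to the
 * scheduler after each record. */
static void thread_body(int id)
{
    int i;

    for (i = 0; i < 2; i++) {
        if (order_len < order_max)
            order_out[order_len++] = id;
        swapcontext(&thr_ctx[id], &sched_ctx);   /* save state, run scheduler */
    }
    done[id] = 1;   /* returning switches to uc_link, i.e. the scheduler */
}

/* Runs NTHREADS cooperative threads round-robin. Stores the order in
 * which they ran in order[] and returns the number of entries, or -1. */
int run_user_threads(int *order, int max)
{
    char *stacks[NTHREADS] = { NULL };
    int id, live, rc = -1;

    order_out = order;
    order_len = 0;
    order_max = max;
    for (id = 0; id < NTHREADS; id++) {
        done[id] = 0;
        stacks[id] = malloc(STACK_SIZE);         /* new stack per thread */
        if (!stacks[id] || getcontext(&thr_ctx[id]) < 0)
            goto out;
        thr_ctx[id].uc_stack.ss_sp = stacks[id];
        thr_ctx[id].uc_stack.ss_size = STACK_SIZE;
        thr_ctx[id].uc_link = &sched_ctx;
        makecontext(&thr_ctx[id], (void (*)(void))thread_body, 1, id);
    }
    do {                                         /* round-robin scheduler */
        live = 0;
        for (id = 0; id < NTHREADS; id++)
            if (!done[id]) {
                swapcontext(&sched_ctx, &thr_ctx[id]);
                live++;
            }
    } while (live > 0);
    rc = order_len;
out:
    for (id = 0; id < NTHREADS; id++)
        free(stacks[id]);
    return rc;
}
```

Here swapcontext plays the role of a machine-dependent switch function: it saves the registers and stack pointer of one thread and loads those of the next.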

Example

• Per-thread state in thread control block structure

typedef struct tcb {
  unsigned long md_esp;    /* Stack pointer of thread */
  char *t_stack;           /* Bottom of thread’s stack */
  /* ... */
} tcb;

• Machine-dependent thread-switch function:
- void thread_md_switch (tcb *current, tcb *next);
• Machine-dependent thread initialization function:
- void thread_md_init (tcb *t, void (*fn) (void *), void *arg);

i386 thread_md_switch

pushl %ebp; movl %esp,%ebp             # Save frame pointer
pushl %ebx; pushl %esi; pushl %edi     # Save callee-saved regs

movl 8(%ebp),%edx                      # %edx = thread_current
movl 12(%ebp),%eax                     # %eax = thread_next
movl %esp,(%edx)                       # %edx->md_esp = %esp
movl (%eax),%esp                       # %esp = %eax->md_esp

popl %edi; popl %esi; popl %ebx        # Restore callee saved regs
popl %ebp                              # Restore frame pointer
ret                                    # Resume execution

i386 thread_md_init

void thread_md_init (tcb *t, void (*fn) (void *), void *arg) {
  u_long *sp = (u_long *) (t->t_stack + thread_stack_size);

  /* Set up a callframe to thread_begin */
  *--sp = (u_long) arg; *--sp = (u_long) fn;
  *--sp = (u_long) t;   *--sp = 0;     /* No return address */

  /* Now set up saved registers for switch.S */
  *--sp = (u_long) thread_begin;       /* return address */
  *--sp = 0; /* ebp */  *--sp = 0; /* ebx */
  *--sp = 0; /* esi */  *--sp = 0; /* edi */

  t->t_md.md_esp = (mdreg_t) sp;
}

• Switch will call thread_begin (t, fn, arg);

Implementing kernel level threads

• Start with process abstraction in kernel
• Strip out unnecessary features
- Same address space
- Same file table
- (Plan 9’s rfork actually allows individual control)
• Faster than a process, but still very heavyweight